fix(bots): skip bot upsert when nothing changed to stop team-strip + reindex loop on boot by joaopamaral · Pull Request #28128 · open-metadata/OpenMetadata

joaopamaral · 2026-05-14T20:21:20Z

Summary

BotResource.initialize() runs UserUtil.addOrUpdateBotUser(user) for every bot on every OM boot. The in-memory User built by UserUtil.user(...) does not have the teams field populated, so the PUT path through userRepository.createOrUpdate -> UserUpdater.entitySpecificUpdate runs updateTeams(original, updated) with original.teams = [Organization] (or the bot's real stored teams) and updated.teams = null. updateTeams then executes:

deleteTo(original.getId(), USER, Relationship.HAS, Entity.TEAM);
assignTeams(updated, updated.getTeams()); // updated.getTeams() == null

…which strips every stored team membership the bot had, bumps the user version, and triggers an Elasticsearch reindex of both the user and each affected team. With several bots this fires on every restart and produces the reindex storm we observed in production logs:

storeNewVersion called for entity: user <bot-id>,
  changeDescription=fieldsDeleted=[
    FieldChange[name=teams,
                oldValue=[{"id":"...","name":"Organization",...}],
                newValue=<null>]]

Repeated for profiler-bot, governance-bot, usage-bot, ingestion-bot, … each one taking ~100–200 ms plus a team-side reindex. We also saw Circular dependency detected in team hierarchy for team: Organization. Skipping to prevent StackOverflowError. from SubjectContext during the same window — the boot-time team churn was tripping the cycle guard.

In a real environment with several bots this added ~3 minutes to every boot.

Fix

Short-circuit addOrUpdateBotUser when the incoming bot has no real change vs. the persisted row: compare description, displayName, and roles. If they all match, return the original user and skip the PUT entirely — no UserUpdater, no team strip, no version bump, no reindex.

Two small adjustments to make the guard actually fire:

retrieveWithAuthMechanism now also loads \"roles\" (was loading only \"authenticationMechanism\"). description and displayName are scalar JSON-column fields and were already populated by the base read.
Compare roles via listOrEmpty(...) on both sides because the database-loaded original returns an empty list while the freshly built in-memory user returns null, and Objects.equals(null, []) is false.

The first call still hits the existing code path (no originalUser -> guard skipped), so seeding new bots is unchanged.

Reproducer

Boot OpenMetadata. The standard set of bots (ingestion-bot, profiler-bot, governance-bot, usage-bot, …) is upserted by BotResource.initialize().
As an admin, assign one of those bots to a real (non-Organization) team.
Restart OpenMetadata.
Observed (before this fix): the bot is no longer in that team, only Organization. Boot logs show fieldsDeleted=[name=teams, oldValue=[...], newValue=null] for every bot, each followed by user + team ES reindex log lines.
After this fix: the bot still belongs to the assigned team after the restart. None of those fieldsDeleted=[teams] log lines appear. Boot time drops accordingly (~3 minutes saved in production).

Test plan

Added openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest with two Mockito unit tests:

addOrUpdateBotUserShortCircuitsWhenNothingChanged — mocks UserRepository, stubs getByName to return a stored bot whose roles/description/displayName match the incoming user, calls addOrUpdateBotUser(boundary), asserts the return is the same instance as the stored user, and verifies userRepository.createOrUpdate(...) was never called. Verified to fail before the fix (test reaches UserUtil.addOrUpdateUser and explodes on the unstubbed createOrUpdate); passes after the fix.
addOrUpdateBotUserGoesThroughUpsertWhenDisplayNameChanged — same setup but with mismatching displayName; verifies userRepository.createOrUpdate(...) is called, so we don't accidentally short-circuit on real changes.

Local manual verification: spun the fix into the 1.12.7 backport branch we run in production, restarted, observed no fieldsDeleted=[teams] log entries for the bot users and no follow-up bot/team reindex log lines. Boot duration dropped by ~3 minutes.

Opening as draft for maintainer feedback on:

whether the comparison set (roles, description, displayName) is acceptable or you'd prefer a broader/narrower field check;
the chosen test style (Mockito unit test) vs. a different pattern you'd rather see.

🤖 Generated with Claude Code

github-actions · 2026-05-14T20:21:45Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

joaopamaral · 2026-05-18T14:26:53Z

Addressed all four bot review comments in 71c6156a72:

Imports — added AuthenticationMechanism / AuthType imports, dropped the inline FQNs at the old lines 66-67 and 108-109.
Swallowed RuntimeException — replaced the catch (RuntimeException ignored) flow-control pattern with a real createOrUpdate stub that returns a PutResponse so addOrUpdateUser completes normally and the assertNotEquals(stored, result, ...) assertion does the work.
Unused roleRef — removed.
email not in short-circuit comparison — added Objects.equals(originalUser.getEmail(), user.getEmail()) to the guard with a short comment explaining the case (an admin changing principalDomain between restarts → UserUtil.user(...) derives a different email → must not be skipped). Both unit tests now set the email on stored and incoming and still pass; without the fix on the underlying method the test still fails (verified by reverting UserUtil.java to upstream/main, running the test → 1 error; restoring → both green).

github-actions · 2026-05-18T14:27:10Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

github-actions · 2026-05-18T14:37:09Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

github-actions · 2026-05-18T14:38:35Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copilot

Pull request overview

This PR prevents a boot-time loop where bot users are upserted on every restart, which (because the in-memory bot User lacks teams) triggers updateTeams(...) to remove existing bot team memberships and causes repeated version bumps + search reindex storms. The fix adds a no-op short-circuit in UserUtil.addOrUpdateBotUser(...), and adds unit tests to ensure the short-circuit behavior.

Changes:

Add a short-circuit in UserUtil.addOrUpdateBotUser to skip createOrUpdate when key bot fields are unchanged.
Ensure retrieveWithAuthMechanism also loads roles so the short-circuit can compare them.
Add Mockito-based unit tests covering both the no-op path and the upsert path.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
openmetadata-service/src/main/java/org/openmetadata/service/util/UserUtil.java	Adds the short-circuit guard and expands the fields loaded for bot comparison.
openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest.java	Adds unit tests validating short-circuit vs. upsert behavior.

github-actions · 2026-05-18T14:56:38Z

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

github-actions · 2026-05-18T17:19:46Z

🟡 Playwright Results — all passed (14 flaky)

✅ 4056 passed · ❌ 0 failed · 🟡 14 flaky · ⏭️ 103 skipped

Shard	Passed	Flaky	Skipped
🟡 Shard 1	298	1	4
🟡 Shard 2	743	7	25
🟡 Shard 3	782	2	7
🟡 Shard 4	788	2	18
✅ Shard 5	709	0	41
🟡 Shard 6	736	2	8

🟡 14 flaky test(s) (passed on retry)

Features/TeamsHierarchy.spec.ts › Delete Parent Team (shard 1, 1 retry)
Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
Features/BulkImport.spec.ts › Table (shard 2, 1 retry)
Features/Glossary/GlossaryWorkflow.spec.ts › should start term as Draft when glossary has reviewers (shard 2, 1 retry)
Features/KnowledgeCenterList.spec.ts › Knowledge Center List - Test infinite scroll/pagination (shard 2, 1 retry)
Features/KnowledgeCenterTextEditor.spec.ts › Rich Text Editor - Text Formatting (shard 2, 1 retry)
Features/KnowledgeCenterTextEditor.spec.ts › Rich Text Editor - Text Formatting (shard 2, 1 retry)
Features/KnowledgeCenterTextEditor.spec.ts › Rich Text Editor - Text Formatting (shard 2, 1 retry)
Features/Table.spec.ts › Table search with sorting should work (shard 3, 1 retry)
Flow/SchemaTable.spec.ts › schema table test (shard 3, 1 retry)
Pages/CustomProperties.spec.ts › Set & Update all CP types on apiCollection (shard 4, 1 retry)
Pages/DataContracts.spec.ts › Create Data Contract and validate for Worksheet (shard 4, 1 retry)
Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
Pages/ServiceEntity.spec.ts › Announcement create, edit & delete (shard 6, 1 retry)

📦 Download artifacts

How to debug locally

# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

joaopamaral · 2026-05-18T19:07:57Z

Checking failing tests 👀

…reindex loop on boot `BotResource.initialize()` runs `UserUtil.addOrUpdateBotUser(user)` for every bot on every OM boot. The in-memory `User` built by `UserUtil.user(...)` does not have the `teams` field populated, so the PUT path through `userRepository.createOrUpdate -> UserUpdater.entitySpecificUpdate` runs `updateTeams(original, updated)` with `original.teams = [Organization]` (or the bot's real stored teams) and `updated.teams = null`. `updateTeams` then executes `deleteTo(user, HAS, TEAM) + assignTeams(null)`, which strips every stored team membership the bot had, bumps the user version, and triggers an Elasticsearch reindex of both the user and each affected team. With several bots this fires on every restart and produces the reindex storm plus "Circular dependency detected in team hierarchy for team: Organization" warnings in the boot logs. In one production deployment this added almost 3 minutes to every boot. Short-circuit when the incoming bot has no real change vs. the persisted row: compare `description`, `displayName`, and `roles`. If they all match, return the original user and skip the PUT entirely — no `UserUpdater`, no team strip, no version bump, no reindex. Two adjustments to make the guard actually fire: - `retrieveWithAuthMechanism` now also loads `"roles"` (was loading only `"authenticationMechanism"`); `description` and `displayName` are scalar JSON-column fields and were already populated by the base read. - Compare `roles` via `listOrEmpty(...)` on both sides because the database-loaded original returns an empty list while the freshly built in-memory user returns null, and `Objects.equals(null, [])` is false. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

gitar-bot · 2026-05-18T19:23:01Z

+    // Short-circuit when the incoming bot user has no real change vs. what's already in the
+    // database. Without this guard every OM boot calls into addOrUpdateUser ->
+    // userRepository.createOrUpdate, and UserUpdater.entitySpecificUpdate then runs
+    // updateTeams/updateRoles/etc. with the incoming `user.getTeams() == null`, which strips
+    // the bot's stored team relationships, bumps the version, and triggers an Elasticsearch
+    // reindex of the bot user (and any team membership change ripples into the team_search
+    // index too). With many bots this is a reindex storm on every restart.


💡 Quality: Overly verbose inline comment block in production code

The 7-line comment block (lines 335–341) explains the why behind the entire PR rather than documenting non-obvious behaviour at the call site. The PR description already captures the rationale. Per the project's "write self-documenting code; only document complex business logic or workarounds" guideline, this would be cleaner as a 1–2 line comment (e.g., // Skip upsert when nothing changed — avoids stripping teams/triggering reindex on boot). The rest is commit-message / PR-description material.

_{Was this helpful? React with 👍 / 👎}

gitar-bot · 2026-05-18T19:23:07Z

Code Review 👍 Approved with suggestions 5 resolved / 6 findings

Prevents redundant bot upserts during boot by adding a short-circuit comparison for bot attributes, successfully resolving the team-strip and reindexing loop. Please consider condensing the verbose inline comment block in the production logic.

💡 Quality: Overly verbose inline comment block in production code

📄 openmetadata-service/src/main/java/org/openmetadata/service/util/UserUtil.java:335-341

The 7-line comment block (lines 335–341) explains the why behind the entire PR rather than documenting non-obvious behaviour at the call site. The PR description already captures the rationale. Per the project's "write self-documenting code; only document complex business logic or workarounds" guideline, this would be cleaner as a 1–2 line comment (e.g., // Skip upsert when nothing changed — avoids stripping teams/triggering reindex on boot). The rest is commit-message / PR-description material.

✅ 5 resolved

✅ Quality: Fully qualified class names used instead of imports

📄 openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest.java:66-67 📄 openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest.java:108-109
The test uses fully qualified names org.openmetadata.schema.entity.teams.AuthenticationMechanism inline (lines 66-67 and 108-109) instead of importing the class. Per project conventions, wildcard imports and fully qualified names should be avoided — add a proper import statement instead.

✅ Quality: Swallowed RuntimeException uses flow-control exception pattern

📄 openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest.java:125-135
The test at line 131 catches RuntimeException ignored to handle the case where the mock doesn't fully stub the downstream path. This is a flow-control exception anti-pattern and makes the test fragile — if addOrUpdateBotUser throws for an unexpected reason, the test silently passes. Consider stubbing createOrUpdate to return a value (or use doNothing()/doReturn(...)) so the method completes normally, and assert on the result instead.

✅ Quality: Unused private helper method roleRef should be removed

📄 openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest.java:140-145
The private method roleRef at lines 140-145 is annotated with @SuppressWarnings("unused") and is never called. Commented-out or dead code should be removed per project conventions. If it's intended for future tests, add it when those tests are written.

✅ Edge Case: Short-circuit doesn't compare email field changes

📄 openmetadata-service/src/main/java/org/openmetadata/service/util/UserUtil.java:342-346
The guard at line 342-346 compares roles, description, and displayName but not email. The UserUtil.user(...) method (which builds the in-memory bot user) may set an email based on domain. If an admin changes the domain configuration between restarts, the email update would be silently skipped by the short-circuit. Consider whether email should be included in the comparison, or document why it's excluded.

✅ Quality: Fully qualified class names instead of imports in test

📄 openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest.java:128-130
Lines 128-130 of UserUtilBotTest.java use fully qualified class names (jakarta.ws.rs.core.Response.Status.OK and org.openmetadata.schema.type.EventType.ENTITY_UPDATED) inline instead of importing them. Per project conventions, wildcard and fully qualified names should be avoided — add proper imports at the top of the file.

🤖 Prompt for agents

Code Review: Prevents redundant bot upserts during boot by adding a short-circuit comparison for bot attributes, successfully resolving the team-strip and reindexing loop. Please consider condensing the verbose inline comment block in the production logic.

1. 💡 Quality: Overly verbose inline comment block in production code
   Files: openmetadata-service/src/main/java/org/openmetadata/service/util/UserUtil.java:335-341

   The 7-line comment block (lines 335–341) explains the *why* behind the entire PR rather than documenting non-obvious behaviour at the call site. The PR description already captures the rationale. Per the project's "write self-documenting code; only document complex business logic or workarounds" guideline, this would be cleaner as a 1–2 line comment (e.g., `// Skip upsert when nothing changed — avoids stripping teams/triggering reindex on boot`). The rest is commit-message / PR-description material.

Options

Display: compact → Showing less information.

Comment with these commands to change:

`Compact`
`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

sonarqubecloud · 2026-05-18T23:31:47Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

gitar-bot Bot reviewed May 14, 2026

View reviewed changes

Comment thread openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest.java Outdated

gitar-bot Bot reviewed May 14, 2026

View reviewed changes

Comment thread openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest.java Outdated

gitar-bot Bot reviewed May 14, 2026

View reviewed changes

Comment thread openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest.java Outdated

gitar-bot Bot reviewed May 14, 2026

View reviewed changes

Comment thread openmetadata-service/src/main/java/org/openmetadata/service/util/UserUtil.java

joaopamaral force-pushed the bot-team-strip-fix branch from f5365ec to 71c6156 Compare May 18, 2026 14:26

gitar-bot Bot reviewed May 18, 2026

View reviewed changes

Comment thread openmetadata-service/src/test/java/org/openmetadata/service/util/UserUtilBotTest.java Outdated

joaopamaral force-pushed the bot-team-strip-fix branch from 71c6156 to c85a520 Compare May 18, 2026 14:36

joaopamaral marked this pull request as ready for review May 18, 2026 14:38

Copilot AI review requested due to automatic review settings May 18, 2026 14:38

Copilot started reviewing on behalf of joaopamaral May 18, 2026 14:38 View session

Copilot AI reviewed May 18, 2026

View reviewed changes

Comment thread openmetadata-service/src/main/java/org/openmetadata/service/util/UserUtil.java

Comment thread openmetadata-service/src/main/java/org/openmetadata/service/util/UserUtil.java

yan-3005 previously approved these changes May 18, 2026

View reviewed changes

yan-3005 added the safe to test Add this label to run secure Github workflows on PRs label May 18, 2026

yan-3005 temporarily deployed to test May 18, 2026 15:03 — with GitHub Actions Inactive

joaopamaral dismissed yan-3005’s stale review via 652e6e8 May 18, 2026 19:21

joaopamaral force-pushed the bot-team-strip-fix branch from 2b464bb to 652e6e8 Compare May 18, 2026 19:21

gitar-bot Bot reviewed May 18, 2026

View reviewed changes

joaopamaral temporarily deployed to test May 18, 2026 19:32 — with GitHub Actions Inactive

Merge branch 'main' into bot-team-strip-fix

514f0db

joaopamaral temporarily deployed to test May 18, 2026 22:31 — with GitHub Actions Inactive

joaopamaral requested a deployment to test May 18, 2026 22:31 — with GitHub Actions In progress

Conversation

joaopamaral commented May 14, 2026

Summary

Fix

Reproducer

Test plan

Uh oh!

github-actions Bot commented May 14, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

joaopamaral commented May 18, 2026

Uh oh!

github-actions Bot commented May 18, 2026

Uh oh!

Uh oh!

github-actions Bot commented May 18, 2026

Uh oh!

github-actions Bot commented May 18, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 18, 2026

Uh oh!

github-actions Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🟡 Playwright Results — all passed (14 flaky)

Uh oh!

joaopamaral commented May 18, 2026

Uh oh!

gitar-bot Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

gitar-bot Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sonarqubecloud Bot commented May 18, 2026

Quality Gate passed for 'open-metadata-ingestion'

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions Bot commented May 18, 2026 •

edited

Loading

gitar-bot Bot commented May 18, 2026 •

edited

Loading